Zihao Mao
Cameron Matson
9/22/2017
For this lab we examine the images of the Stanford Dog Dataset. The dataset consists of ~20,000 images of dogs from 120 different breeds.
The dataset is primarily used for fine-grained classification problems, meaning that the instances are all members of the same main class and are divided by subclass. In this case, the main class is 'Dog' and the subclass is the breed: 'Beagle', 'Poodle', 'Lab'... These are potentially more difficult than standard classification problems because in theory all members of the main class share similar features. In other words, as the saying goes, "a dog is a dog is a dog, not a cat."
Another challenge with this dataset is that the images do not depict a standard scene. These are not face shots of dogs. These are not photoshoot photos of dogs. The images are not even exclusively of dogs: some contain multiple dogs or even people. The dataset would benefit from preprocessing that standardizes the images so they are all of the same kind, for instance by using face detection to crop each image around the dog.
We imagine one potential use for fine-grained classification of dogs: searching for lost pets. Imagine poor Susan has lost her precious Bichon Frise, Tutu. She goes to her local police station and demands that they check all of the town's traffic cameras for traces of Tutu. Well, they say, there are hours of footage, and no one wants to look at it all. Poor Susan. Now suppose there is a program that will "watch" the video and recognize when there is a four-legged animal in view. The image could then be put through a classifier to detect whether that four-legged beast is a dog or a cat (or something else). Hooray! It's a dog! Now the image is put through a fine-grained classifier, which is able to tell that the dog IS in fact a Bichon Frise and not a Yorkshire Terrier. The police can then determine where Tutu is, and Susan is very happy.
How well does a system like that need to work? Each successive level probably does not need to be as precise as the last (and it likely won't be, because each successive level is more difficult than the last). The key point is that a human (with some knowledge of dog breeds) would be close to perfect at identifying dogs, but with thousands of street cameras around, it would take them a long time to go through all the footage. Assuming you do a good job of detecting the dogs in the images, you probably don't have to be that accurate at identifying the Bichon Frise. As long as you have as few false negatives as possible (so that you don't miss a potential Bichon), you could probably get away with a few false positives.
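To make that trade-off concrete, here is the arithmetic with invented counts for a hypothetical Bichon detector: recall is the fraction of true Bichons we catch (the quantity we care about), precision the fraction of alerts that are real.

```python
# Toy numbers for the trade-off described above (all counts are invented):
# in a lost-pet search we want high recall (few false negatives) and can
# tolerate lower precision (some false positives a human can dismiss).
true_positives = 45    # Bichons correctly flagged
false_negatives = 5    # Bichons missed -- the costly error
false_positives = 30   # other dogs flagged for a human to check

recall = true_positives / (true_positives + false_negatives)      # 45/50 = 0.9
precision = true_positives / (true_positives + false_positives)   # 45/75 = 0.6
```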
# first we need to relabel the folders
#import os
#imagedir = '../../data/dogs/Images'
#for f in os.listdir(imagedir):
# if f[0] == '.': # stupid .DS_Store on mac
# continue
# if '-' in f:
# name = f.split('-',2)[1]
# os.renames(os.path.join(imagedir,f), os.path.join(imagedir,name))
#
#for f in os.listdir(imagedir):
# print(f)
# let's rename the images so they're more readable
#for breed in os.listdir(imagedir):
# if breed[0] == '.': continue
# for img in os.listdir(os.path.join(imagedir,breed)):
# tail = img.split('_',2)[1]
# name = breed+'_'+tail
# os.rename(os.path.join(imagedir,breed,img), os.path.join(imagedir,breed,name))
import numpy as np
import os
import matplotlib.pyplot as plt
from scipy.misc import imresize
from skimage.color import rgb2gray
%matplotlib inline
imagedir = '../../data/dogs/Images'
There are 120 different breeds included in the dataset, with about 150 images of each breed, for a total of 20,580 images. The images are stored in directories by breed. To make the size of the dataset more manageable, we'll take a sample of 50 images from each of 60 breeds.
# remove the .DS_Store files macOS leaves in each directory
for d in os.listdir(imagedir):
    if d.find('.DS') != -1:
        os.remove(os.path.join(imagedir, d))
        continue
    for f in os.listdir(os.path.join(imagedir, d)):
        if f.find('.DS') != -1:
            os.remove(os.path.join(imagedir, d, f))
def load_images(num_samples, num_classes, h, w):
    # preallocate the matrix: num_samples instances of each of num_classes
    # breeds, each image flattened to h*w pixels
    img_arr = np.empty((num_samples * num_classes, h * w))
    label_arr = []
    i = 0
    # sample num_classes breeds from the dataset
    a = np.arange(len(os.listdir(imagedir)))
    np.random.shuffle(a)
    breed_sample_idxs = a[:num_classes]
    for idx in breed_sample_idxs:
        breed = os.listdir(imagedir)[idx]
        if breed[0] == '.':
            continue  # stupid .DS_Store on mac
        print(i // num_samples, breed)
        # sample num_samples images from the breed
        b = np.arange(len(os.listdir(os.path.join(imagedir, breed))))
        np.random.shuffle(b)
        img_sample_idxs = b[:num_samples]
        for idx in img_sample_idxs:
            dog_path = os.path.join(imagedir, breed, os.listdir(os.path.join(imagedir, breed))[idx])
            if dog_path.find('.DS') != -1:
                continue  # stupid .DS_Store on mac
            img = plt.imread(dog_path)
            # convert the image to gray, resize it to h x w, then flatten it
            img_gray_resize_flat = rgb2gray(imresize(img, (h, w, 3))).flatten()
            img_arr[i] = img_gray_resize_flat
            i = i + 1
            # add name to list of labels
            fname = dog_path.split('/')[-1]       # 'dog_name_123497.jpg'
            dog_name = fname[:fname.rfind('_')]   # 'dog_name'
            label_arr.append(dog_name)
    return img_arr, label_arr
%%time
num_samples_per_breed = 50
num_breeds = 60
h=200
w=200
dogs, labels = load_images(num_samples=num_samples_per_breed, num_classes=num_breeds, h=h, w=w)
import pandas as pd
X = pd.DataFrame(dogs)
X
ex = dogs[0].reshape((200,200))
plt.imshow(ex, cmap='gray')
plt.title(labels[0])
plt.show()
# taken from Class Demo #4
def plot_gallery(images, titles, h, w, n_row=3, n_col=6):
    """Helper function to plot a gallery of portraits"""
    plt.figure(figsize=(1.7 * n_col, 2.3 * n_row))
    plt.subplots_adjust(bottom=0, left=.01, right=1.5, top=.90, hspace=.35)
    # with slight modification: show a random sample of the images
    sample = np.random.randint(low=0, high=images.shape[0], size=n_row * n_col)
    for i, idx in enumerate(sample):
        plt.subplot(n_row, n_col, i + 1)
        plt.imshow(images[idx].reshape((h, w)), cmap=plt.cm.gray)
        plt.title(titles[idx], size=12)
        plt.xticks(())
        plt.yticks(())
plot_gallery(dogs, labels, 200, 200) # defaults to showing a random 3 by 6 sample of the dogs
Aren't they cute? The answer is yes. They are.
First, let's find the number of principal components beyond which there is no further improvement in the explained variance ratio.
from sklearn.decomposition import PCA
h = 200
w = 200
n_components = 3000
print ("Extracting the top %d eigenfaces from %d faces" % (
n_components, dogs.shape[0]))
pca = PCA(n_components=n_components)
%time pca.fit(dogs.copy())
eigenfaces = pca.components_.reshape((n_components, h, w))
def plot_explained_variance(pca):
    import plotly
    from plotly.graph_objs import Scatter, Marker, Layout, XAxis, YAxis, Bar, Line
    plotly.offline.init_notebook_mode()  # run at the start of every notebook
    explained_var = pca.explained_variance_ratio_
    cum_var_exp = np.cumsum(explained_var)
    plotly.offline.iplot({
        "data": [Bar(y=explained_var, name='individual explained variance'),
                 Scatter(y=cum_var_exp, name='cumulative explained variance')],
        "layout": Layout(xaxis=XAxis(title='Principal components'),
                         yaxis=YAxis(title='Explained variance ratio'))
    })
plot_explained_variance(pca)
According to the graph above, adding components beyond roughly 1000 no longer gives an effective improvement in the explained variance ratio. Therefore, we decided to take 1000 as the number of components for the PCA.
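The cutoff can also be read off programmatically. A minimal sketch, with a made-up, fast-decaying variance spectrum standing in for the real `pca.explained_variance_ratio_`:

```python
import numpy as np

def components_for_variance(explained_variance_ratio, target=0.85):
    """Smallest number of components whose cumulative explained
    variance reaches the target fraction."""
    cum = np.cumsum(explained_variance_ratio)
    # argmax returns the first index where the condition holds
    return int(np.argmax(cum >= target)) + 1

# synthetic spectrum that sums to 1 (invented numbers)
ratios = np.array([0.4, 0.3, 0.2, 0.05, 0.05])
n85 = components_for_variance(ratios, target=0.85)   # cumsum: .4, .7, .9, ...
```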
n_components = 1000
print ("Extracting the top %d eigenfaces from %d faces" % (
n_components, dogs.shape[0]))
pca = PCA(n_components=n_components)
%time pca.fit(dogs.copy())
eigenfaces = pca.components_.reshape((n_components, h, w))
eigenface_titles = ["eigenface %d" % i for i in range(eigenfaces.shape[0])]
# taken from Class Demo #4
def plot_gallery(images, titles, h, w, n_row=3, n_col=6):
    """Helper function to plot a gallery of portraits"""
    plt.figure(figsize=(1.7 * n_col, 2.3 * n_row))
    plt.subplots_adjust(bottom=0, left=.01, right=1.5, top=.90, hspace=.35)
    sample = np.arange(n_row * n_col)
    for i, idx in enumerate(sample):
        plt.subplot(n_row, n_col, i + 1)
        plt.imshow(images[idx].reshape((h, w)), cmap=plt.cm.gray)
        plt.title(titles[idx], size=12)
        plt.xticks(())
        plt.yticks(())
plot_gallery(eigenfaces, eigenface_titles, h, w)
Then let's see what the images look like after being reconstructed from the PCA representation.
# taken from Class Demo #4
def reconstruct_image(trans_obj, org_features):
    low_rep = trans_obj.transform(org_features)
    rec_image = trans_obj.inverse_transform(low_rep)
    return low_rep, rec_image
dogs_to_reconstruct = 1
dogs_idx = dogs[dogs_to_reconstruct]
low_dimensional_representation, reconstructed_image = reconstruct_image(pca, dogs_idx.reshape(1, -1))
plt.subplot(1,2,1)
plt.imshow(dogs_idx.reshape((h, w)), cmap=plt.cm.gray)
plt.title('Original')
plt.grid()
plt.subplot(1,2,2)
plt.imshow(reconstructed_image.reshape((h, w)), cmap=plt.cm.gray)
plt.title('Reconstructed from Full PCA')
plt.grid()
As we can see from the comparison, although the reconstructed image is less sharp, it's very close to the original image. 1000 components cover around 97% of the variance of the overall image dataset.
Let's also try randomized PCA, which uses a randomized SVD solver to approximate the top components more cheaply, and compare the results.
n_components = 1000
print ("Extracting the top %d eigenfaces from %d faces" % (
n_components, dogs.shape[0]))
rpca = PCA(n_components=n_components,svd_solver='randomized')
%time rpca.fit(dogs.copy())
eigenfaces = rpca.components_.reshape((n_components, h, w))
eigenface_titles = ["eigenface %d" % i for i in range(eigenfaces.shape[0])]
plot_gallery(eigenfaces, eigenface_titles, h, w)
dogs_to_reconstruct = 1
dogs_idx = dogs[dogs_to_reconstruct]
low_dimensional_representation, reconstructed_image_random = reconstruct_image(rpca,dogs_idx.reshape(1, -1))
low_dimensional_representation, reconstructed_image = reconstruct_image(pca,dogs_idx.reshape(1, -1))
plt.figure(figsize=(1.7 * 3, 2.3 * 1))
plt.subplots_adjust(bottom=0, left=.01, right=1.5, top=.90, hspace=.35)
#origin
plt.subplot(1,3,1)
plt.imshow(dogs_idx.reshape((h, w)), cmap=plt.cm.gray)
plt.title('Original')
plt.grid()
#Full PCA
plt.subplot(1,3,2)
plt.imshow(reconstructed_image.reshape((h, w)), cmap=plt.cm.gray)
plt.title(' Full PCA')
plt.grid()
#Random PCA
plt.subplot(1,3,3)
plt.imshow(reconstructed_image_random.reshape((h, w)), cmap=plt.cm.gray)
plt.title(' Random PCA')
plt.grid()
As we can see, the reconstructed images are very similar: the randomized solver approximates the same principal subspace as the full SVD.
Since kernel PCA takes too long to run on the whole dataset, we choose to apply it only to a subset of the data: 1000 images instead of 3000. We will take 20 images from each of 50 classes.
dogs_sub, labels_sub = load_images(num_samples=20, num_classes=50, h=h, w=w)
dogs_sub.shape
%%time
from sklearn.decomposition import KernelPCA
n_components = 500
print ("Extracting the top %d eigenfaces from %d faces" % (n_components, dogs_sub.shape[0]))
kpca = KernelPCA(n_components=n_components, kernel='rbf',
fit_inverse_transform=True, gamma=15, # very sensitive to the gamma parameter,
remove_zero_eig=True)
kpca.fit(dogs_sub.copy())
To make a comparison with linear dimensionality reduction, let's also fit a regular full PCA on the same subset.
n_components = 1000
print ("Extracting the top %d eigenfaces from %d faces" % (
    n_components, dogs_sub.shape[0]))
pca = PCA(n_components=n_components)
%time pca.fit(dogs_sub.copy())
plot_explained_variance(pca)
It seems like the PCA approaches its maximum explained variance at around 600 components. In order to see the difference in performance between the two methods, we need a smaller number; in this case, we choose 500 components to make the difference easier to see.
n_components = 500
print ("Extracting the top %d eigenfaces from %d faces" % (
    n_components, dogs_sub.shape[0]))
pca = PCA(n_components=n_components)
%time pca.fit(dogs_sub.copy())
# taken from Class Demo #4
import warnings
warnings.simplefilter('ignore', DeprecationWarning)
from ipywidgets import widgets
def plt_reconstruct(dogs_to_reconstruct):
    reconstructed_image = pca.inverse_transform(pca.transform(dogs_sub[dogs_to_reconstruct].reshape(1, -1)))
    reconstructed_image_kpca = kpca.inverse_transform(kpca.transform(dogs_sub[dogs_to_reconstruct].reshape(1, -1)))
    plt.figure(figsize=(15,7))
    plt.subplot(1,3,1)
    plt.imshow(dogs_sub[dogs_to_reconstruct].reshape((h, w)), cmap=plt.cm.gray)
    plt.title("Original")
    plt.grid()
    plt.subplot(1,3,2)
    plt.imshow(reconstructed_image.reshape((h, w)), cmap=plt.cm.gray)
    plt.title('Full PCA with 500 n_comp')
    plt.grid()
    plt.subplot(1,3,3)
    plt.imshow(reconstructed_image_kpca.reshape((h, w)), cmap=plt.cm.gray)
    plt.title('Kernel PCA with 500 n_comp')
    plt.grid()
widgets.interact(plt_reconstruct,dogs_to_reconstruct=100,__manual=True)
According to the comparison above, with the same number of components, kernel PCA does a much better job than regular full PCA. The quality of the images reconstructed by kernel PCA is close to the original, while the images reconstructed by regular PCA are still unclear in some cases. The time comparison, however, is about 4 minutes for kernel PCA versus 8 seconds for regular PCA.
Let's see the comparison when we raise the number of components to the maximum for regular PCA.
n_components = 1000
print ("Extracting the top %d eigenfaces from %d faces" % (
n_components, dogs_sub.shape[0]))
pca = PCA(n_components=n_components)
%time pca.fit(dogs_sub.copy())
def plt_reconstruct_max(dogs_to_reconstruct):
    reconstructed_image = pca.inverse_transform(pca.transform(dogs_sub[dogs_to_reconstruct].reshape(1, -1)))
    reconstructed_image_kpca = kpca.inverse_transform(kpca.transform(dogs_sub[dogs_to_reconstruct].reshape(1, -1)))
    plt.figure(figsize=(15,7))
    plt.subplot(1,3,1)
    plt.imshow(dogs_sub[dogs_to_reconstruct].reshape((h, w)), cmap=plt.cm.gray)
    plt.title("Original")
    plt.grid()
    plt.subplot(1,3,2)
    plt.imshow(reconstructed_image.reshape((h, w)), cmap=plt.cm.gray)
    plt.title('Full PCA with 1000 n_comp')
    plt.grid()
    plt.subplot(1,3,3)
    plt.imshow(reconstructed_image_kpca.reshape((h, w)), cmap=plt.cm.gray)
    plt.title('Kernel PCA with 500 n_comp')
    plt.grid()
widgets.interact(plt_reconstruct_max,dogs_to_reconstruct=100,__manual=True)
For this dataset, regular PCA reaches its maximum performance in far less time. Kernel PCA requires fewer components, but it takes much more time to compute: about 4 minutes for kernel PCA to reach the desired quality, while regular full PCA needs only 5 seconds. Therefore, we prefer regular full PCA for dimensionality reduction since it takes less time. Kernel PCA took more than an hour when we tried to apply it to all 3000 images.
Let's start by doing simple edge detection using the image gradient (a.k.a. a Sobel filter).
from skimage.filters import sobel_h, sobel_v
idx_to_reconstruct = int(np.random.rand(1)*len(dogs))
img = dogs[idx_to_reconstruct].reshape((h,w))
plt.figure(figsize=(15,30))
plt.subplot(1,4,1)
plt.imshow(img, cmap='gray')
plt.title(labels[idx_to_reconstruct]+' - original')
plt.subplot(1,4,2)
plt.imshow(sobel_v(img,), cmap='gray')
plt.title('v.sobel filter')
plt.subplot(1,4,3)
plt.imshow(sobel_h(img), cmap='gray')
plt.title('h.sobel filter')
plt.subplot(1,4,4)
gradient_mag = np.sqrt(sobel_v(img)**2 + sobel_h(img)**2 )
plt.imshow(gradient_mag, cmap='gray')
plt.title('gradient [v^2+h^2]^1/2')
plt.show()
Let's take the gradient of each image in the dataset and see if we can use it to classify the breed. Or at least get similar looking images...
def take_gradient(row, shape):
    img = row.reshape(shape)
    gradient_mag = np.sqrt(sobel_v(img)**2 + sobel_h(img)**2)
    return gradient_mag.reshape(-1)
# case
%time take_gradient(dogs[0], ((h,w))).shape
%time grad_features = np.apply_along_axis(take_gradient, 1, dogs, (h,w))
print(grad_features.shape)
Let's take a quick look at some of these
plot_gallery(grad_features, labels, h, w)
It seems to be a pretty good edge detector, but because 1) there is a lot of noise in the images and 2) the dogs are in many different poses, it probably isn't a very good feature for classification.
from sklearn.metrics.pairwise import pairwise_distances
# find the pairwise distance between all the different image features
%time dist_matrix = pairwise_distances(grad_features)
import copy
# find closest image to current image
idx1 = np.random.randint(0,len(dogs))
distances = copy.deepcopy(dist_matrix[idx1,:])
distances[idx1] = np.infty # dont pick the same image!
idx2 = np.argmin(distances)
plt.figure(figsize=(10,10))
plt.subplot(2,2,1)
plt.imshow(dogs[idx1].reshape((h,w)), cmap='gray')
plt.title("Original Image - " + labels[idx1])
plt.subplot(2,2,2)
plt.imshow(dogs[idx2].reshape((h,w)), cmap='gray')
plt.title("Closest Image - " + labels[idx2])
plt.subplot(2,2,3)
plt.imshow(grad_features[idx1].reshape((h,w)), cmap='gray')
plt.title("Original Image gradient")
plt.subplot(2,2,4)
plt.imshow(grad_features[idx2].reshape((h,w)), cmap='gray')
plt.title("Closest Image gradient")
plt.show()
This method doesn't work very well for this dataset since it is extremely sensitive to the position of the subject in the image. If two images are very "close" to one another by this measure, it is more likely that the subjects of the images are in similar positions than that the subjects are actually similar to one another. For example, consider the match from one iteration below:

No one would ever mistake a miniature schnauzer for an Irish setter. However, in these particular images the two dogs are both forward facing, approximately the same size relative to the size of the image, and pictured with a grassy background. Thus, by their gradients, the images are similar.
This method is essentially the same as a pixel-wise comparison. If an edge (i.e. high gradient intensity) occurs at one pixel in image A and appears just one pixel over in image B, the distance (as computed by the Euclidean metric on the flattened gradients) is large at both pixels, even though the edges nearly coincide.
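A tiny numeric example of this sensitivity, using made-up 5x5 "images": the same vertical edge shifted by a single pixel ends up farther, in Euclidean distance, from the original than an entirely blank image is.

```python
import numpy as np

a = np.zeros((5, 5)); a[:, 2] = 1.0   # a vertical edge in column 2
b = np.zeros((5, 5)); b[:, 3] = 1.0   # the identical edge, shifted one pixel
blank = np.zeros((5, 5))              # no edge at all

d_shift = np.linalg.norm(a - b)       # distance to the shifted edge: sqrt(10)
d_blank = np.linalg.norm(a - blank)   # distance to the blank image: sqrt(5)
```

So a one-pixel misalignment is penalized twice (once where the edge was, once where it moved to), which is exactly why pose dominates the matches.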
We can illustrate this by looking at a heatmap of the pairwise distance of the gradients:
import seaborn as sns
plt.figure(figsize=(10,9))
ax = sns.heatmap(dist_matrix[:200,:200], cmap='magma')
ax.set_xticks(np.arange(0,200,50))
ax.set_xticks(np.arange(0,200,10), minor=True)
ax.set_yticks(np.arange(0,200,50))
ax.set_yticks(np.arange(0,200,10), minor=True)
ax.set_xticklabels([*labels[0:200:50]])
ax.set_xticklabels(np.arange(0,200,10), minor=True)
ax.set_yticklabels([*labels[0:200:50]])
ax.set_yticklabels(np.arange(0,200,10), minor=True)
ax.grid(markevery=5, lw=4,color='black')
ax.set_title('Pairwise Distance of Gradient by Class')
plt.show()
This heatmap shows the pairwise distances between the instances of the first four breeds in the dataset. If minimizing the distance in the gradient were any good as a classifier, one would expect significantly darker colors within each major square along the diagonal than in the rest of the grid. Since each square in the 4x4 grid has roughly the same distribution of colors, we can conclude that this is not a good classifier.
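The same reading of the heatmap can be checked numerically: for a distance matrix to be useful for classification, the mean within-class distance should be clearly smaller than the mean between-class distance. A sketch with synthetic two-cluster features standing in for `grad_features` (all sizes invented):

```python
import numpy as np

# two well-separated synthetic "breeds" of 50 instances each
rng = np.random.RandomState(0)
class_a = rng.randn(50, 10)          # cluster around the origin
class_b = rng.randn(50, 10) + 10.0   # cluster far away
X = np.vstack([class_a, class_b])

# full pairwise Euclidean distance matrix via broadcasting
dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

# mean distance inside the diagonal blocks vs the off-diagonal block
within = (dist[:50, :50].mean() + dist[50:, 50:].mean()) / 2
between = dist[:50, 50:].mean()
```

On our gradient features the two numbers come out roughly equal, which is the quantitative version of "every square has the same colors".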
What if we look at those gradients by class?
plt.figure(figsize=(15,25))
plt.subplot(1,5,1)
breedA = grad_features[0:50]
average_breedA = np.apply_along_axis(func1d=np.mean, arr=breedA, axis=0).reshape((h,w))
plt.imshow(average_breedA)
plt.title(labels[0])
plt.subplot(1,5,2)
breedB = grad_features[50:100]
average_breedB = np.apply_along_axis(func1d=np.mean, arr=breedB, axis=0).reshape((h,w))
plt.imshow(average_breedB)
plt.title(labels[50])
plt.subplot(1,5,3)
breedC = grad_features[100:150]
average_breedC = np.apply_along_axis(func1d=np.mean, arr=breedC, axis=0).reshape((h,w))
plt.imshow(average_breedC)
plt.title(labels[100])
plt.subplot(1,5,4)
breedD = grad_features[150:200]
average_breedD = np.apply_along_axis(func1d=np.mean, arr=breedD, axis=0).reshape((h,w))
plt.imshow(average_breedD)
plt.title(labels[150])
plt.subplot(1,5,5)
allFour = grad_features[:200]
average_allFour = np.apply_along_axis(func1d=np.mean, arr=allFour, axis=0).reshape((h,w))
plt.imshow(average_allFour)
plt.title('All Four')
There clearly isn't much congruence among the images, which we knew already. When we average the gradient over a class, it essentially looks like white noise, not much different from when we take the average gradient over all four classes together.
from skimage.feature import hog
# let's first visualize what the HOG descriptor looks like
features, img_desc = hog(image=img, block_norm='L2-Hys',visualise=True)
plt.figure(figsize=(10,10))
plt.subplot(1,2,1)
plt.imshow(img)
plt.subplot(1,2,2)
plt.imshow(img_desc, cmap='gray')
plt.grid()
def apply_hog(row, shape):
    feat = hog(row.reshape(shape), block_norm='L2-Hys')
    return feat.reshape((-1))
%time test_feature = apply_hog(dogs[3],(h,w))
test_feature.shape
# apply to entire data, row by row,
%time hog_features = np.apply_along_axis(apply_hog, 1, dogs, (h,w))
print(hog_features.shape)
from sklearn.metrics.pairwise import pairwise_distances
# find the pairwise distance between all the different image features
%time dist_matrix = pairwise_distances(hog_features)
import copy
# find closest image to current image
idx1 = np.random.randint(len(dogs))
distances = copy.deepcopy(dist_matrix[idx1,:])
distances[idx1] = np.infty # dont pick the same image!
idx2 = np.argmin(distances)
plt.figure(figsize=(7,10))
plt.subplot(2,2,1)
plt.imshow(dogs[idx1].reshape((h,w)), cmap='gray')
plt.title("Original Image - "+labels[idx1])
plt.grid()
plt.subplot(2,2,2)
plt.imshow(dogs[idx2].reshape((h,w)), cmap='gray')
plt.title("Closest Image - "+labels[idx2])
plt.grid()
plt.figure(figsize=(10,9))
ax = sns.heatmap(dist_matrix[:200,:200], cmap='magma')
ax.set_xticks(np.arange(0,200,50))
ax.set_xticks(np.arange(0,200,10), minor=True)
ax.set_yticks(np.arange(0,200,50))
ax.set_yticks(np.arange(0,200,10), minor=True)
ax.set_xticklabels([*labels[0:200:50]])
ax.set_xticklabels(np.arange(0,200,10), minor=True)
ax.set_yticklabels([*labels[0:200:50]])
ax.set_yticklabels(np.arange(0,200,10), minor=True)
ax.grid(markevery=5, lw=4,color='black')
ax.set_title('Pairwise Distance of HOG by Class')
plt.show()
Let's see if using the DAISY method is any more effective as a classifier.
from skimage.feature import daisy
# let's first visualize what the DAISY descriptor looks like
features, img_desc = daisy(img,step=40, radius=10, rings=3, histograms=5, orientations=8, visualize=True)
plt.imshow(img_desc, cmap='gray')
plt.grid()
# create a function to take in a row of the matrix and return a new feature
def apply_daisy(row, shape):
    feat = daisy(row.reshape(shape), step=10, radius=10, rings=2, histograms=6, orientations=8, visualize=False)
    return feat.reshape((-1))
%time test_feature = apply_daisy(dogs[3],(h,w))
test_feature.shape
# apply to entire data, row by row,
# takes about a minute to run
%time daisy_features = np.apply_along_axis(apply_daisy, 1, dogs, (h,w))
print(daisy_features.shape)
from sklearn.metrics.pairwise import pairwise_distances
# find the pairwise distance between all the different image features
%time dist_matrix = pairwise_distances(daisy_features)
import copy
# find closest image to current image
idx1 = np.random.randint(len(dogs))
distances = copy.deepcopy(dist_matrix[idx1,:])
distances[idx1] = np.infty # dont pick the same image!
idx2 = np.argmin(distances)
plt.figure(figsize=(7,10))
plt.subplot(1,2,1)
plt.imshow(dogs[idx1].reshape((h,w)), cmap='gray')
plt.title("Original Image - "+labels[idx1])
plt.grid()
plt.subplot(1,2,2)
plt.imshow(dogs[idx2].reshape((h,w)), cmap='gray')
plt.title("Closest Image - "+labels[idx2])
plt.grid()
Hey! It actually got one right!

However these two images are of the curly in VERY similar poses
Now this is interesting

It looks like the outline of the branches is most similar to this very long haired dog.
On inspection it looks like this does at least a little better than the gradient method, but it still seems to be working on the context of the image more than the content in it, i.e. the position and orientation of the dog rather than the characteristics of the dog itself.
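One hypothetical way to put a number on that impression is leave-one-out 1-nearest-neighbour accuracy computed directly from a precomputed distance matrix. The sketch below uses synthetic, well-separated features and invented labels in place of `daisy_features`; on a good feature the accuracy should be far above chance.

```python
import numpy as np

def loo_nn_accuracy(dist_matrix, labels):
    """Leave-one-out 1-NN accuracy from a precomputed distance matrix."""
    d = dist_matrix.astype(float).copy()
    np.fill_diagonal(d, np.inf)        # never match an image to itself
    nearest = np.argmin(d, axis=1)     # index of each image's closest neighbour
    return float(np.mean(labels[nearest] == labels))

# synthetic two-class features (20 instances each, well separated)
rng = np.random.RandomState(1)
feats = np.vstack([rng.randn(20, 5), rng.randn(20, 5) + 5.0])
labels_arr = np.array([0] * 20 + [1] * 20)
dists = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=2)
acc = loo_nn_accuracy(dists, labels_arr)
```

Running the same function on `dist_matrix` and our breed labels would quantify how much better than chance (1/60) DAISY actually is.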
plt.figure(figsize=(10,9))
ax = sns.heatmap(dist_matrix[:200,:200], cmap='magma')
ax.set_xticks(np.arange(0,200,50))
ax.set_xticks(np.arange(0,200,10), minor=True)
ax.set_yticks(np.arange(0,200,50))
ax.set_yticks(np.arange(0,200,10), minor=True)
ax.set_xticklabels([*labels[0:200:50]])
ax.set_xticklabels(np.arange(0,200,10), minor=True)
ax.set_yticklabels([*labels[0:200:50]])
ax.set_yticklabels(np.arange(0,200,10), minor=True)
ax.grid(markevery=5, lw=4,color='black')
ax.set_title('Pairwise Distance of DAISY by Class')
plt.show()
When we look at the same heatmap as before, we see that overall the distances are reduced (there also seems to be roughly a factor of 10 scale difference in the DAISY features), but again there is no real difference between classes.